Sound resynthesis from Auditory Mellin Image using STRAIGHT

Authors

  • T. Irino
  • R. D. Patterson
  • H. Kawahara
Abstract

We propose an Auditory VOCODER that resynthesizes sound from the Auditory Mellin Image, an auditory representation that segregates the size and shape information of incoming sound. The sound resynthesis part combines three techniques: the STRAIGHT VOCODER [2], frequency-warping cepstral analysis [4,12], and nonlinear multivariate regression analysis (MRA). We explain these methods and the evaluation of the system. Initial listening tests indicate that the sound quality is reasonable. The auditory components enhance the noise suppression and stream segregation performance during speech processing.


Similar resources

An Auditory Vocoder Resynthesis of Speech from an Auditory Mellin Representation

An auditory Mellin transform has been proposed to segregate information about the size and shape of the vocal tract automatically; the process is also independent of glottal pitch. In this paper, we describe a method for resynthesizing speech from the Mellin representation using a high quality vocoder (STRAIGHT), and a nonlinear function to map between the two representations of speech. This en...


Stabilised wavelet mellin transform: an auditory strategy for normalising sound-source size

We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length o...


Extracting Size and Shape Information of Sound Source in an Optimal Auditory Processing Model

We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length o...


Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform

We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal-tract from information about its shape. The duration of the impulse response of the vocal t...


Multi-frame Super Resolution for Improving Vehicle Licence Plate Recognition

License plate recognition (LPR) by digital image processing, which is widely used in traffic monitoring and control, is one of the most important goals in Intelligent Transportation Systems (ITS). In real ITS, the resolution of input images is often not very high because of technological challenges and the cost of high-resolution cameras. However, when the license plate image is taken at low resolution, the license...




Publication date: 2001